1 Executive summary
1.1 Background
Since 2020, COVID-19 has spread widely around the world and has seriously disrupted public order and people’s daily lives. To better control the epidemic, countries have formulated different policies and updated them in real time to deal with changing situations.
1.2 Target Audience
Policy-making organizations in countries with serious epidemics, and researchers who manage policies during the epidemic (mainly in health and administration departments).
1.3 Aim
Help the target audience understand the relationship between the stringency index and new cases, and the effectiveness of different national policies:
Whether restrictive policies could affect new cases.
How to better formulate policies and predict situations through policies.
1.4 Multidisciplinary Context
Our project combined four disciplines. The members with backgrounds in Data Science and Software Development mainly focus on programming and modelling; the members with backgrounds in Politics and Management are mainly responsible for analysis and information collection.
1.4.1 Agile development mode
We used the agile development model we had learned. At the beginning of the project, we discussed everyone’s previous disciplines and expertise and assigned tasks. We set every two weeks as a sprint, dividing the whole project into four sprints. Each sprint had a sprint planning meeting, a sprint review meeting, and a sprint retrospective meeting. In addition, we held daily scrum meetings to monitor progress and solve problems.
1.4.2 Github
We established the Github repository to share our code and materials: Github link
1.5 Data Source
Covid_full Data Source: a collection of COVID-19 data maintained by Our World in Data.
Oxford Data Source: collected from the Oxford COVID-19 Government Response Tracker (OxCGRT).
2 Initial Data Analysis (IDA)
2.1 Data Set
The dimension of covid_full is 161909 x 67.
The dimension of oxford is 278779 x 61.
2.2 Data Input
The variable we used from the covid_full data set:
Location: Indicate the name of the country.
Date: Indicate the time.
Stringency_index: the strictness of government control. It is a composite measure based on 9 response indicators, rescaled to a value from 0 to 100 (100 = strictest response).
New_cases_per_million: new confirmed cases of COVID-19 per 1,000,000 people. We use new_cases_per_million rather than raw counts because the total population of each country differs; the per-million variable makes comparisons fairer and more accurate.
The variable we used from the oxford data set:
CountryName, Date, StringencyIndex: Same as the covid_full data set
C1-C8, H1: the 9 response indicators underlying the stringency index: school closures, workplace closures, cancellation of public events, restrictions on gatherings, closure of public transport, stay-at-home requirements, restrictions on internal movement, international travel controls, and public information campaigns, respectively. The value indicates the strictness level of the indicator.
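As a sketch, the two data sets can be read and trimmed to the variables above roughly as follows. The file names and the exact OxCGRT column patterns are assumptions for illustration, not the project’s actual paths:

```r
library(dplyr)

# Placeholder file names -- substitute the actual data exports.
covid_full <- read.csv("owid-covid-data.csv")
oxford     <- read.csv("OxCGRT_latest.csv")

covid_used <- covid_full %>%
  select(location, date, stringency_index, new_cases_per_million)

# OxCGRT indicator columns carry prefixed names (e.g. "C1_School.closing"),
# so we match on the indicator codes C1-C8 and H1.
oxford_used <- oxford %>%
  select(CountryName, Date, StringencyIndex, matches("^C[1-8]_|^H1_"))
```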
3 Data Clean
3.1 Select the Top 20 countries with serious covid
We focused on countries with serious epidemic situations in 2021 and selected the top 20 countries with the largest total cases between “2021-01-01” and “2021-12-31”. 2021 is also the most recent full year in the data set, making it the most representative period for our topic.
Top 20 Countries:
“United States” “India” “Brazil” “United Kingdom” “Russia”
“France” “Turkey” “Germany” “Spain” “Iran”
“Italy” “Argentina” “Colombia” “Indonesia” “Poland”
“Mexico” “Ukraine” “South Africa” “Netherlands” “Philippines”
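The selection step can be sketched with dplyr. This is a sketch on a toy stand-in data frame (column names `location`/`new_cases` follow Our World in Data); with the real covid_full the same pipeline applies with `n = 20`, after first excluding OWID aggregate rows such as "World":

```r
library(dplyr)

# Toy stand-in for the real covid_full data (three countries, three days each).
covid_full_demo <- data.frame(
  location  = rep(c("A", "B", "C"), each = 3),
  date      = rep(c("2021-01-01", "2021-01-02", "2021-01-03"), times = 3),
  new_cases = c(10, 20, 30, 1, 2, 3, 100, 200, 300)
)

top_countries <- covid_full_demo %>%
  filter(date >= "2021-01-01", date <= "2021-12-31") %>%
  group_by(location) %>%
  summarise(total_cases_2021 = sum(new_cases, na.rm = TRUE), .groups = "drop") %>%
  slice_max(total_cases_2021, n = 2)   # n = 20 with the real data

countries_demo <- top_countries$location  # countries ordered by total cases
```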
3.2 Clean the covid_full Data
For the covid_full data, we need to observe the information each time the stringency index changed in a country. Hence, we extract the location, date, and stringency index at every change point and sum the new cases per million over each period with a constant stringency index. After cleaning, we form a new data frame called "cleaned_covid".
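The change-point extraction can be sketched with dplyr: each time stringency_index differs from the previous row, a new period starts, and cases are summed per period. A sketch of the idea on a toy data frame, not the project’s actual code:

```r
library(dplyr)

toy <- data.frame(
  location              = "A",
  date                  = as.Date("2021-01-01") + 0:4,
  stringency_index      = c(50, 50, 60, 60, 70),
  new_cases_per_million = c(1, 2, 3, 4, 5)
)

cleaned_toy <- toy %>%
  arrange(location, date) %>%
  group_by(location) %>%
  # a new period starts whenever the stringency index changes
  mutate(period = cumsum(stringency_index !=
                           lag(stringency_index, default = first(stringency_index)))) %>%
  group_by(location, period) %>%
  summarise(time = first(date),
            stringency = first(stringency_index),
            total_new_cases_per_million = sum(new_cases_per_million),
            .groups = "drop")
# three periods: stringency 50 (cases 3), 60 (cases 7), 70 (cases 5)
```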
3.3 Clean the Oxford Data
For the oxford data, we needed to observe changes in the 9 indicators. We extract the same fields as from the covid_full data, calculate the difference in the 9 indicators between consecutive stringency-index values, and store the differences as a list. After cleaning, we form another data frame called "cleaned_oxford".
For the prediction model, we joined the 9 indicators in the oxford data with new_cases_per_million in the covid_full data by CountryName, Date, and StringencyIndex. To turn total_new_cases_per_million into a factor, we calculated the median value for each country, set any value larger than the median to 1 (stored in newcase_Outcome), and any other value to 0. Finally, we form a new data frame called "model_cleaned".
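The median split can be sketched as follows. This assumes the median is taken per country (one reading of "the median value of each of them"), shown on a toy data frame:

```r
library(dplyr)

toy <- data.frame(
  country                     = c("A", "A", "A", "B", "B", "B"),
  total_new_cases_per_million = c(10, 20, 30, 1, 2, 9)
)

model_toy <- toy %>%
  group_by(country) %>%
  # 1 if above this country's median, 0 otherwise
  mutate(newcase_Outcome = as.integer(total_new_cases_per_million >
                                        median(total_new_cases_per_million, na.rm = TRUE))) %>%
  ungroup()
# country A: median 20 -> outcomes 0, 0, 1; country B: median 2 -> outcomes 0, 0, 1
```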
4 Research Questions
4.1 Q1: Find out whether the change in restrictive policies could affect new cases
4.1.1 Combined First Difference
Combined FD:
The first difference is the difference between two consecutive terms of a series. We use first-order differences to view the change between two adjacent periods, so we can observe changes in both the stringency index and new cases.
#calculate the first difference of stringency_index
cleaned_covid$stringency_difference <- 0
most_diff <- data.frame()
for (i in countries){
index <- which ( cleaned_covid$country == i )
this_country <- cleaned_covid[ cleaned_covid$country == i , ]
difference <- diff( this_country$stringency)
#record this country's largest single jump in stringency as one row
temp_frame <- data.frame(countr = i, max_stringency_diff = max(difference, na.rm = TRUE))
most_diff <- rbind(most_diff, temp_frame)
#the first period of each country has no predecessor, so its difference is 0
cleaned_covid[index, ]$stringency_difference <- c(0, difference)
}
#calculate the first difference of total_new_cases_per_million
cleaned_covid$new_cases_difference <- 0
for (i in countries){
index <- which ( cleaned_covid$country == i )
this_country <- cleaned_covid[ cleaned_covid$country == i , ]
difference <- diff( this_country$total_new_cases_per_million)
cleaned_covid[index, ]$new_cases_difference <- c(0, difference)
}
#combine the FD of stringency_index and total_new_cases_per_million
cleaned_covid$new_cases_difference<-cleaned_covid$new_cases_difference/100
p_6 <- ggplot(data = cleaned_covid) +
  geom_line(aes(x = time, y = new_cases_difference, color = "new_cases_difference")) +
  geom_line(aes(x = time, y = stringency_difference, color = "stringency_difference")) +
  facet_wrap( ~ country, scales = "free_y") +
  labs(title = "Figure 1: New Cases vs Stringency FD", x = "Time", y = "Value") +
  theme(
    plot.title = element_text(color="#99CCFF", size=14, face="bold"),
    axis.title.x = element_text(color="#66B2FF", size=14, face="bold"),
    axis.title.y = element_text(color="#66B2FF", size=14, face="bold")
  ) +
  scale_x_date(breaks = "3 month", labels = date_format("%m")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
plotly::ggplotly(p_6)
We combined the FD figures of stringency_index and total_new_cases_per_million, scaling total_new_cases_per_million by dividing by 100. The countries with large fluctuations in stringency_index, such as Mexico and the Philippines, show a small range of change in total_new_cases_per_million. Conversely, countries with relatively smooth stringency_index changes, such as the United States and Spain, show a relatively large range of change in total_new_cases_per_million.
This suggests that governments that constantly adjust policies based on the real-time situation control new cases better, while inflexible policies cannot control the epidemic.
4.1.2 Hierarchical Clustering
For further investigation, we use hierarchical clustering to assess grouping patterns. Manhattan distance is used in both graphs; it is the sum of the absolute coordinate-wise differences between two points in the standard coordinate system.
Before clustering, a 20 x 20 distance matrix was created: the Manhattan distance between each pair of countries was calculated with the formula above, and countries at similar distances were grouped into four categories in total. ggplot is then used to plot the country groups produced by the clustering.
Stringency Cluster:
# x is first time series, y is second time series; p is kept in the signature but unused
#manhattan distance: sum of absolute coordinate-wise differences
manhattan_dist <-
function(x, y, p){
distance = sum(abs(x - y), na.rm = TRUE)
return(distance)
}
# Hierarchical clustering for stringency
options(warn = -1)
p = 2
covid_list = split.data.frame(cleaned_covid[,c("time","stringency")], cleaned_covid$country)
n = length(countries)
distance_matrix <- matrix(0, n, n)
dateindex = covid_list[[1]]$time
for (i in 1:n ){
for (j in 1:n){
index_i = match(covid_list[[i]]$time, dateindex)
index_j = match(covid_list[[j]]$time, dateindex)
ts_i <- covid_list[[i]][index_i,"stringency"]
ts_j <- covid_list[[j]][index_j,"stringency"]
distance_matrix[i,j] <- manhattan_dist(ts_i, ts_j, p)
}
}
rownames(distance_matrix) <- colnames(distance_matrix) <- countries
distance_matrix[!is.finite(distance_matrix)] <- 0
matrix_dist <- as.dist(distance_matrix)
hclust_res <- hclust( matrix_dist, method = "ward.D")
hclust_cluster <-
cutree(hclust_res, k = 4) %>% as.factor %>% as.data.frame
# draw the line chart based on the stringency cluster
cleaned_covid$cluster <-hclust_cluster[as.character(cleaned_covid$country), 1]
p_2 = ggplot() +
geom_line(data =cleaned_covid, aes(x = time, y = stringency, color = country)) +
facet_wrap( ~ cluster,scales = "fixed", ncol = 1) +
theme_bw() + labs(title="Figure 2: Stringency Index Change Trend", x ="Time", y = "Stringency Index") + theme(
plot.title = element_text(color="#99CCFF", size=14, face="bold"),
axis.title.x = element_text(color="#66B2FF", size=14, face="bold"),
axis.title.y = element_text(color="#66B2FF", size=14, face="bold")
)
plotly::ggplotly(p_2)
As the stringency_index line chart shows, most countries, such as France, Colombia, and Spain, maintained a high stringency_index until July 2021 and then relaxed slowly over time. For example, France announced in early July that masks would not be mandatory in nightclubs, and children were allowed to return to school when classes began in September[1].
Some countries, such as Poland, the UK, and the US, had their highest levels of state control in January 2021. Policy controls in the UK dropped sharply around May: after in-person classes had been restricted for almost a year, almost all UK schools were open by the end of May 2021, and stay-at-home orders were lifted across the country in mid-April[2]. For the United States, the severity of the closure and containment indicators decreased through 2021, with the sharpest decline occurring in March 2021[3].
In addition, some countries, such as India and Mexico, are classified into one category due to relatively frequent fluctuations, indicating that the degree of policy control in these countries has been adjusted more times.
New Cases Cluster:
#scale the total_new_cases_per_million
library("scales")
trend <- cleaned_covid
trend$total_new_cases_per_million <- rescale(trend$total_new_cases_per_million, to = c(0, 1))
# Hierarchical clustering for total_new_cases_per_million
options(warn = -1)
p = 2
covid_list = split.data.frame(trend[,c("time","total_new_cases_per_million")], trend$country)
n = length(countries)
distance_matrix <- matrix(0, n, n)
dateindex = covid_list[[1]]$time
for (i in 1:n ){
for (j in 1:n){
index_i = match(covid_list[[i]]$time, dateindex)
index_j = match(covid_list[[j]]$time, dateindex)
ts_i <- covid_list[[i]][index_i,"total_new_cases_per_million"]
ts_j <- covid_list[[j]][index_j,"total_new_cases_per_million"]
distance_matrix[i,j] <- manhattan_dist(ts_i, ts_j, p)
}
}
rownames(distance_matrix) <- colnames(distance_matrix) <- countries
distance_matrix[!is.finite(distance_matrix)] <- 0
matrix_dist <- as.dist(distance_matrix)
hclust_res <- hclust( matrix_dist, method = "ward.D")
hclust_cluster <-
cutree(hclust_res, k = 4) %>% as.factor %>% as.data.frame
#draw the line chart based on new cases cluster
trend$cluster <-hclust_cluster[as.character(trend$country), 1]
p_4 = ggplot() +
geom_line(data =trend, aes(x = time, y = total_new_cases_per_million, color = country)) +
facet_wrap( ~ cluster,scales = "fixed", ncol = 1) +
theme_bw() + labs(title="Figure 3: New Cases Change Trend", x ="Time", y = "New Cases per Million") + theme(
plot.title = element_text(color="#99CCFF", size=14, face="bold"),
axis.title.x = element_text(color="#66B2FF", size=14, face="bold"),
axis.title.y = element_text(color="#66B2FF", size=14, face="bold")
)
plotly::ggplotly(p_4)
According to the clustering results for new cases, the United States forms a category of its own because of its large fluctuation within the year. Some countries, such as the UK and Poland, are still grouped together; as the UK policy discussion above shows, relevant restrictions were lifted, and although wearing masks remained mandatory, the number of new cases in the UK began to rise sharply in mid-May.
France is grouped with India and Italy because of its subdued increase in cases until August. However, France then saw a sharp rise in new cases in August that continued until January 2022, largely because of the lifting of restrictions after July.
By comparing the two cluster line charts, we can see that the degree of policy control in each country is closely related to the increase in cases, and the loosening of policies is likely to lead to a rapid increase in cases. In addition, the United States was grouped with countries such as the United Kingdom in the first line chart but was classified separately in the case chart. This shows that although the policies of the United States roughly match those of these countries, they are less efficient than those of similar countries. Therefore, the audience (policymakers in each country) should not simply copy the control policies of other countries but should formulate policies according to the actual situation of their own country when changing the degree of policy control.
4.2 Q2: How to better formulate policies and predict cases level through policies
4.2.1 Map(lag time)
Lag time calculation:
The “lag time” is the time from when the stringency_index starts to increase until total_new_cases_per_million starts to decrease. We record the two dates as the "start_date" and "end_date" of the lag time, and calculate the difference between them in days. To compare the efficiency of policies across countries, we find the longest lag time in each country and store it in the lag_data data frame as “max lag time”, together with the corresponding country and start date.
#calculate the lag time of each country
lag_data <- data.frame()
id <- 1
for (i in 1:length(countries)) {
lag_location = countries[i]
lag_countries <- cleaned_covid %>% dplyr::filter(country == lag_location)
start <- NULL
start_date = NULL
end_date = NULL
max_lag_difftime = 0
j = 1
while (j < nrow(lag_countries)){
temp_index <- lag_countries[j,]$stringency
if(!is.na(temp_index)){
if( j != nrow(lag_countries)){
if (lag_countries[j,]$stringency < lag_countries[j+1,]$stringency) {
start_date = lag_countries[j+1,]$time
for (k in (j+1):(nrow(lag_countries)-1)) { # parentheses needed: ":" binds tighter than "+"
temp_case <- lag_countries[k,]$total_new_cases_per_million
temp_case_next <- lag_countries[k+1,]$total_new_cases_per_million
if(!is.na(temp_case) && !is.na(temp_case_next) && length(temp_case) != 0 && length(temp_case_next) != 0){
if (lag_countries[k,]$total_new_cases_per_million > lag_countries[k+1,]$total_new_cases_per_million) {
end_date = lag_countries[k+1,]$time
lag_difftime = difftime(end_date, start_date, units = "days")
if (lag_difftime >= max_lag_difftime) {
max_lag_difftime = lag_difftime
start <- start_date
}
start_date = end_date
j = k
break
}
} else{
break
}
}
j = j + 1
}
}else{
break
}
}else{
break
}
j = j + 1
}
sub_lag_data <- data.frame(id,countries[i], max_lag_difftime, start)
lag_data <- rbind(lag_data, sub_lag_data)
id <- id + 1
}
names(lag_data)[2] <- "Country"
Lag Map:
#read the 2014_world_gdp_with_codes.csv and combine with the lag_data
word_geo <- read.csv("https://raw.githubusercontent.com/plotly/datasets/master/2014_world_gdp_with_codes.csv")
world_merge<- merge(word_geo, lag_data,
by.x = "COUNTRY", by.y = "Country",
all.x = TRUE)
#use the national geographic location data in the word_geo and the lag value in lag_data to draw the map
world_map <- plot_ly(world_merge, type='choropleth', locations=world_merge$CODE, z=world_merge$max_lag_difftime, text=world_merge$COUNTRY, colorscale="RdBu")%>%layout(title = 'Figure 4: COVID19: The max lag time in different countries', plot_bgcolor = "#e5ecf6")
world_map
Geographic data were combined with the cleaned max-lag-time data to show the max lag time in different countries. Max lag times from small to large are represented by colours from cool to warm (blue to red). Several countries appear in orange and red on the map, such as Indonesia, the Philippines, and Iran, indicating a relatively large max lag time. In contrast, South Africa, France, and Italy may need fewer than 40 days for new cases to decrease after the stringency index increases.
This map shows that policies may help decrease total new cases, but the timing of their effect varies from country to country, and different policies have different effects on new cases.
4.2.2 Indicators Difference
The indicator difference is the change in each indicator in a country at the time of its largest lag time, i.e. when the policies changed.
For countries with state-level information, we choose the state with the highest average value to represent the change of the indicator in that country, which is more representative.
#find the Indicators Difference of the max lag time in each country
policy <- data.frame()
for (i in 1:nrow(lag_data)){
max_mean_row <- 1
max_mean <- 0
specific_date = lag_data[i,]$start
specific_country = lag_data[i,]$Country
single_data <- cleaned_oxford %>% dplyr::filter(time == specific_date,country == specific_country)
for (j in seq_len(nrow(single_data))){ # loop over every row, not just the last one
cur_mean <- mean(unlist(single_data$elements_diff[[j]]))
if(cur_mean > max_mean){
max_mean <- cur_mean
max_mean_row <- j
}
}
theRow <- single_data$elements_diff[[max_mean_row]]
temp <- data.frame(specific_country,theRow[1],theRow[2],theRow[3],theRow[4],theRow[5],theRow[6],theRow[7],theRow[8],theRow[9])
policy <- rbind(policy, temp)
}
names(policy) <- c("Country Name", "C1", "C2", "C3", "C4",
                   "C5", "C6", "C7", "C8", "H1")
policy %>%
kbl(caption = "Table 1: Indicators Difference") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),fixed_thead = T)%>%kable_paper(full_width = F) %>%
column_spec(1, bold = T, color = "white", background = "#1F456E")
| Country Name | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | H1 |
|---|---|---|---|---|---|---|---|---|---|
| South Africa | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 |
| France | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 |
| Italy | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Poland | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| Brazil | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| Turkey | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| Argentina | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| Mexico | 0 | 1 | 1 | 4 | 0 | 1 | 0 | 0 | 0 |
| Germany | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| Spain | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| United States | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Russia | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| Ukraine | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Netherlands | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Colombia | 0 | 2 | 1 | 1 | 1 | 2 | 1 | 0 | 0 |
| India | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 |
| United Kingdom | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| Iran | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| Philippines | 1 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
| Indonesia | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Comparing the specific measures behind each country's stringency index above, the countries where new cases declined sooner imposed higher-level restrictions on large-scale gatherings and the movement of people. In addition, “Stay at home requirements” and “Restrictions on internal movement” can reduce the rise in infection rates even at relatively low levels, reducing new cases to a certain extent (Wibbens, Koo, & McGahan, 2020).
Our target audience may consider effective measures according to their own national conditions to promote the decline of new cases in their own countries.
4.3 Prediction Model
Five countries with larger max lag times (Indonesia, Philippines, Iran, United Kingdom, India) were used for prediction: we change their indicator levels and observe the change in total new cases. Because the policies of these countries did not have an immediate impact on reducing new cases, their policy levels correspond better to the levels of newcase_Outcome, giving a relatively more accurate and stable prediction. In countries where a slight change in the stringency index easily flips newcase_Outcome, there is little research significance.
#select the data of five countries
model_data <- data.frame()
model_country<- c("Indonesia", "Philippines",'Iran', "United Kingdom","India")
for( c in model_country){
single_country <- model_cleaned %>% dplyr::filter(country == c)
model_data <- rbind(model_data, single_country)
}
model_data
We fitted a RandomForest model and evaluated it with 5-fold cross-validation, repeated 25 times, to estimate accuracy.
#extract the train data and test data
single_predict <- model_data[,c(4,5,6,7,8,9,10,11,15)]
#drop rows with missing values (a bare na.omit() call would not modify single_predict in place)
single_predict <- single_predict[complete.cases(single_predict), ]
model_scaled = single_predict %>% mutate(newcase_Outcome = factor(newcase_Outcome)) %>% mutate_if(is.numeric, .funs = scale)
#predictors are already scaled above, so only convert to a matrix here
X = model_scaled %>% select(-newcase_Outcome) %>% as.matrix()
y = model_scaled %>% select(newcase_Outcome) %>% pull()
#perform the 5-fold cross-validation and repeat 25 times to test accuracy
cvK = 5 # number of CV folds
cv_50acc5_rf = c()
cv_acc_rf = c()
n_sim = 25 ## number of repeats
for (i in 1:n_sim) {
cvSets = cvTools::cvFolds(nrow(X), cvK) # permute all the data, into 5 folds
cv_acc_rf = c()
for (j in 1:cvK) {
test_id = cvSets$subsets[cvSets$which == j]
X_test = X[test_id, ]
X_train = X[-test_id, ]
y_test = y[test_id]
y_train = y[-test_id]
## RandomForest
rf_res <- randomForest::randomForest(x = X_train, y = as.factor(y_train))
fit <- predict(rf_res, X_test)
cv_acc_rf[j] = mean(fit == y_test)
}
cv_50acc5_rf <- append(cv_50acc5_rf, mean(cv_acc_rf))
} ## end for
# boxplot(list(RF= cv_50acc5_rf ), ylab="CV Accuracy")
acc_frame <- data.frame(cv_50acc5_rf)
p_7 <- ggplot(acc_frame,aes(y = cv_50acc5_rf)) +
geom_boxplot()+labs(title="RandomForest Accuracy", y = "Accuracy")+ theme(
panel.background = element_rect(fill = "lightblue",
colour = "lightblue",
size = 0.5, linetype = "solid"),
panel.grid.major = element_line(size = 0.5, linetype = 'solid',
colour = "white"),
panel.grid.minor = element_line(size = 0.25, linetype = 'solid',
colour = "white")
)
The accuracy of RandomForest is close to 0.7, showing that the model can roughly predict whether the implementation levels of these nine policies will affect the new-case level. The accuracy is not higher because many confounders may also affect newcase_Outcome.
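As an additional diagnostic not part of the original analysis, variable importance from a random forest refit on all rows would indicate which of the nine indicators drive the prediction. Here X and y are the scaled predictors and outcome defined in the cross-validation code above:

```r
# Refit on the full data to inspect which indicators matter most.
rf_full <- randomForest::randomForest(x = X, y = as.factor(y))
randomForest::importance(rf_full)  # mean decrease in Gini per indicator
```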
5 Shiny
Link Here: Shiny app link
Link Here: Shiny app detailed code
5.1 Preparation
We have three different files for constructing a server, UI, and a global environment. First, the global file contains objectives that can be used by the server. Second, the UI file defines the overall framework of the shiny app. UI is mainly responsible for the front-end display of web pages, including defining the layout of web pages, defining static HTML display information, defining the HTML controls that receive user input, and defining the HTML controls that output information when responding to users. In addition, the server file contains a function that is responsible for processing user input and returning response information. The function can take at least two parameters (input and output), which correspond to user input and web page output.
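A minimal skeleton of this three-file structure, collapsed into one script for illustration (the object names and controls here are illustrative, not the app’s actual code):

```r
library(shiny)

# global.R: objects shared by ui and server
country_choices <- c("United States", "United Kingdom")

# ui.R: page layout and the controls that receive user input
ui <- fluidPage(
  selectInput("country", "Country", choices = country_choices),
  plotOutput("trend")
)

# server.R: a function of input and output that builds the response
server <- function(input, output) {
  output$trend <- renderPlot({
    plot(1:10, main = input$country)  # placeholder plot
  })
}

# shinyApp(ui, server)  # launch the app
```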
5.2 Deployment
In the global file, we process all data that will be needed in the server. In the UI file, we define all user input: country selection, date-range selection, and multiple checkbox groups. The country selection lets users choose the country they want to inspect, and the date-range selection works the same way. The checkbox groups offer a choice of pre-selected countries with special attributes (high/low incidence).

We create six tabs, including one “Read Me” page that contains relevant information about our shiny app. Since we have user inputs, the outputs must be generated conditionally on what users want; to achieve this, we embed different if clauses for each plot output. To make plots interactive, we declare plotly outputs in the UI file and then use ggplotly() in the server file.

In addition, we use an HTML function to embed a link inside our shiny app so interested users can be redirected to another page, include one picture on the “Read Me” page using the div(img()) function, and use the span() function in ui.R to customize the font’s colour, style, and size. As stated in the Read Me, the charts in the first three tabs (excluding “Read Me”) are all interactive and linked to user inputs: users can specify the country and time range, or choose from the countries we provide. On the world map, hovering the mouse shows related information (e.g. country details). The last section is a combined line chart that compares the stringency index and new cases for each country.
5.3 Usability
This shiny app aims to help users better understand how new-case trends and stringency-index trends change, as our target audience is policymakers. They can compare differences between countries, which we hope makes policy-making easier. The target audience can use this app to see whether changes in the stringency_index influence the new-case trend. For example, the United States and the United Kingdom have roughly the same stringency index, but their new-case trends are not similar. Policymakers may realize that the stringency index is not the only factor influencing new cases; how to adjust stringency really depends on the country.
6 Limitation
1 Policy Variables Limitation
Only 9 main restrictive policies are considered in calculating the stringency index. However, many other policies implemented in practice, such as vaccination policy, may affect the results.
2 Factors Limitation
Various factors may affect the increase in cases, such as people's behaviour and economic assistance. These confounders may affect the accuracy of the prediction model and the apparent effectiveness of policies.
3 Time Limitation
Our research mainly focused on the past year, 2021. One year's data may not be enough to obtain accurate results.
4 Country Limitation
The prediction model uses data from only a few countries. Although this avoids the impact of unreasonable data, ignoring the situation of other countries may affect the accuracy of the results.
5 Difference Calculation of Indicators
The state with the highest average difference may not represent the whole country's difference, although it does represent the strictest implementation in the country.
6 Newcase Outcome
Using the median value to divide total_new_cases_per_million into two levels may be inaccurate, because large gaps can occur within the same level.
7 Conclusion
In conclusion, our research found that changes in government response have an effect on controlling the number of new cases; raising one policy through multiple levels is more effective than raising multiple policies by one level each; at the same increased level across the nine policies, internal movement control is the most effective; and the RandomForest model can roughly predict the new-case situation. However, our results may not be completely accurate, because the limitations mentioned above may influence the validity of the findings.
In future work, we will create a task board and burndown chart when we use an agile development model to monitor processes. In Data Science, we will collect more relevant data and continue to explore the effectiveness of restrictive policies. In Politics, we continue to provide our target audience with more specific and accurate policy-making methods and prediction models, to help countries with serious epidemics control the development of the epidemic.
8 Contribution
Everyone contributed to the report writing. Moreover, we chose a scrum master and a product owner, and divided the jobs into a materials collector, two analysts, and two programmers based on the expertise of every member.
Katherine An: Scrum master and analyst; She acts as the Scrum Master in the project and is responsible for the Scrum process, ensuring that the team uses Scrum correctly and helping the team to simplify the process and achieve the project goals. She also serves as an analyst for the team: she completed the code and analysis for the lag-time calculation in Research Question 2, the drawing and analysis of the lag-time map, and the analysis of the prediction model. She attended all group meetings and participated in the preparation of the report throughout.
Kang Fu: Materials collector; He attended most of the meetings and listened to everyone’s opinions to refine the work. He is mainly responsible for searching for and collecting additional data sets more suitable for our project research; he is also responsible for adjusting and beautifying the template style of the PPT and the report. Furthermore, he edited the report.
Yachao Zhang: Analyst; She attended weekly meetings and advised on the topic and audience. In the report, she was responsible for the coding, plotting, and analysis of Research Question 1, and drew the main conclusions about the stringency index and new cases by investigating the background information.
Yuxuan Qin: Programmer; He constructed and designed the overall Shiny app framework and integrated teammates’ work into Shiny in an appropriate format. He wrote all the code of the Shiny app and deployed the final product properly in both local and online environments. He also helped teammates with data cleaning on the oxford dataset and contributed to writing the final report.
Yunshuo Zhang: Product owner and programmer; He was responsible for organizing meetings and assigning tasks to team members. He mainly focused on programming and debugging: he cleaned the covid_full and oxford data sets to form three data frames, and generated the indicator difference table, the prediction model, and the combined FD figures. He also helped update the map feature and organize the report.
9 Reference
FRANCE 24.(2021,September 02).French children return to school for new academic year amid stringent Covid-19 rules.Retrieved from: https://www.france24.com/en/france/20210902-french-children-return-to-school-for-new-academic-year-amid-stringent-covid-19-rules
Helen Tatlow, Emily Cameron-Blake, Sagar Grewal, Thomas Hale, Toby Phillips, & Andrew Wood. (2021). "Variation in the response to COVID-19 across the four nations of the United Kingdom." Blavatnik School of Government Working Paper. Retrieved from www.bsg.ox.ac.uk/covidtracker
Laura Hallas, Ariq Hatibie, Rachelle Koch, Saptarshi Majumdar, Monika Pyarali, Andrew Wood, & Thomas Hale. "Variation in US states' responses to COVID-19 3.0."
OxCGRT/covid-policy-tracker. (2021, September 27). Retrieved from GitHub website: https://github.com/OxCGRT/covid-policy-tracker/blob/master/documentation/index_methodology.md
OXFORD COVID-19 Government Response Stringency Index - Humanitarian Data Exchange. (n.d.). Retrieved May 22, 2022, from data.humdata.org website: https://data.humdata.org/dataset/oxford-covid-19-government-response-tracker
Ritchie, H., Mathieu, E., Rodés-Guirao, L., Appel, C., Giattino, C., Ortiz-Ospina, E., … Roser, M. (2020). Coronavirus Pandemic (COVID-19). Our World in Data. Retrieved from https://ourworldindata.org/coronavirus
Shiny - Application layout guide. (2021, January). RStudio. Retrieved from https://shiny.rstudio.com/articles/layout-guide.html
Wibbens, P. D., Koo, W. W.-Y., & McGahan, A. M. (2020). Which COVID policies are most effective? A Bayesian analysis of COVID-19 by jurisdiction. PLOS ONE, 15(12), e0244177. https://doi.org/10.1371/journal.pone.0244177
Appendix
1. Boxplot of Accuracy
plotly::ggplotly(p_7)
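The object `p_7` is a ggplot built earlier in the report, so the call above only wraps it for interactivity. As a hedged, self-contained sketch of how such an accuracy boxplot could be assembled (the `acc` data frame, its column names, and the country labels here are illustrative assumptions, not the report's actual data):

```r
library(ggplot2)

# Assumed data frame of per-country prediction accuracies (illustrative only)
set.seed(1)
acc <- data.frame(
  country  = rep(c("Australia", "France", "United Kingdom"), each = 10),
  accuracy = runif(30, min = 0.6, max = 0.95)
)

# Static boxplot of accuracy by country
p_7 <- ggplot(acc, aes(x = country, y = accuracy)) +
  geom_boxplot() +
  labs(title = "Boxplot of Accuracy", x = "Country", y = "Accuracy")

# ggplotly() converts the static ggplot object into an interactive
# plotly widget with hover tooltips and zooming
plotly::ggplotly(p_7)
```

`ggplotly()` keeps the ggplot aesthetics and layout, which is why the report can reuse the same object for both the static and interactive versions.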
2. Package version
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
## [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
## [5] LC_TIME=English_Australia.1252
## system code page: 936
##
## attached base packages:
## [1] tools stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] zoo_1.8-9 viridis_0.6.2 viridisLite_0.4.0
## [4] tuneR_1.3.3.1 forcats_0.5.1 stringr_1.4.0
## [7] purrr_0.3.4 readr_2.1.2 tidyr_1.2.0
## [10] tidyverse_1.3.1 tsfeatures_1.0.2 tibble_3.1.6
## [13] scales_1.1.1 sp_1.4-6 RSpectra_0.16-0
## [16] R.utils_2.11.0 R.oo_1.24.0 R.methodsS3_1.8.1
## [19] reshape2_1.4.4 readxl_1.3.1 proxy_0.4-26
## [22] pheatmap_1.0.12 plotly_4.10.0 maps_3.4.0
## [25] janitor_2.1.0 limma_3.50.1 kableExtra_1.3.4
## [28] knitr_1.37 GEOquery_2.62.2 Biobase_2.54.0
## [31] BiocGenerics_0.40.0 ggplot2_3.3.5 ggthemes_4.2.4
## [34] DT_0.21 dplyr_1.0.8 devtools_2.4.3
## [37] usethis_2.1.5 class_7.3-20 cvTools_0.3.2
## [40] robustbase_0.93-9 lattice_0.20-45 crosstalk_1.2.0
##
## loaded via a namespace (and not attached):
## [1] backports_1.4.1 systemfonts_1.0.4 plyr_1.8.6
## [4] lazyeval_0.2.2 digest_0.6.29 htmltools_0.5.2
## [7] fansi_1.0.2 magrittr_2.0.2 memoise_2.0.1
## [10] cluster_2.1.2 tzdb_0.2.0 remotes_2.4.2
## [13] modelr_0.1.8 xts_0.12.1 svglite_2.1.0
## [16] forecast_8.16 rmdformats_1.0.3 tseries_0.10-49
## [19] prettyunits_1.1.1 colorspace_2.0-3 signal_0.7-7
## [22] rvest_1.0.2 haven_2.4.3 xfun_0.29
## [25] callr_3.7.0 crayon_1.5.0 jsonlite_1.8.0
## [28] glue_1.6.2 gtable_0.3.0 webshot_0.5.3
## [31] V8_4.1.0 pkgbuild_1.3.1 quantmod_0.4.18
## [34] DEoptimR_1.0-10 DBI_1.1.2 randomcoloR_1.1.0.1
## [37] Rcpp_1.0.8 htmlwidgets_1.5.4 httr_1.4.2
## [40] RColorBrewer_1.1-2 ellipsis_0.3.2 farver_2.1.0
## [43] pkgconfig_2.0.3 nnet_7.3-17 sass_0.4.1
## [46] dbplyr_2.1.1 utf8_1.2.2 labeling_0.4.2
## [49] tidyselect_1.1.2 rlang_1.0.1 munsell_0.5.0
## [52] cellranger_1.1.0 cachem_1.0.6 cli_3.2.0
## [55] generics_0.1.2 broom_0.7.12 evaluate_0.15
## [58] fastmap_1.1.0 yaml_2.3.5 processx_3.5.2
## [61] fs_1.5.2 randomForest_4.7-1 nlme_3.1-155
## [64] xml2_1.3.3 brio_1.1.3 compiler_4.1.2
## [67] rstudioapi_0.13 curl_4.3.2 testthat_3.1.2
## [70] reprex_2.0.1 bslib_0.3.1 stringi_1.7.6
## [73] highr_0.9 ps_1.6.0 desc_1.4.0
## [76] Matrix_1.4-0 urca_1.3-0 vctrs_0.3.8
## [79] pillar_1.7.0 lifecycle_1.0.1 lmtest_0.9-39
## [82] jquerylib_0.1.4 data.table_1.14.2 R6_2.5.1
## [85] bookdown_0.26 gridExtra_2.3 sessioninfo_1.2.2
## [88] MASS_7.3-55 assertthat_0.2.1 pkgload_1.2.4
## [91] rprojroot_2.0.2 withr_2.4.3 fracdiff_1.5-1
## [94] parallel_4.1.2 hms_1.1.1 quadprog_1.5-8
## [97] grid_4.1.2 timeDate_3043.102 rmarkdown_2.14
## [100] snakecase_0.11.0 Rtsne_0.15 TTR_0.24.3
## [103] lubridate_1.8.0